-
-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-allocating trim method #247
base: master
Are you sure you want to change the base?
Conversation
add additional tests for trim
add testcase that does actual trimming work
Thank you for this pull request. This yields a significant performance improvement in our project. @BurntSushi what would be the next steps to consider this PR? |
I need to review it. From a quick glance, it looks like there's some clean up required here. For example, there shouldn't be a "trim original" routine in the public API. CI also obviously needs to be fixed. |
@do4gr hello, thank you at lot for this. Do you plan to work on this PR again? Otherwise I can try to polish what you did in order to get it merged. |
I'll remove the trim_original and then we'll see whether the PR run succeed. I think it was just flakey CI. |
Apart from that from my side the PR would be done. There is nothing more I was thinking of including. |
That's it from my side. Let me know if there is more that needs to be done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting this PR.
for element_end_idx in &mut self.0.bounds.ends[..length] { | ||
// left trim | ||
let mut found_only_whitespace = true; | ||
while read_pos < *element_end_idx { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest finding the first non-whitespace position first and then using slice.copy_within to copy the rest. That would (a) make this somewhat easier to read (e.g. obviate the need for found_only_whitespace) and (b) be potentially easier to optimize for the compiler (no need to deduce that once the branch with the copy is taken it will be taken until the loop finishes).
Hey, thanks for the library.
When I was using it I saw that a lot of time was spent in the trim method and stumbled over the comment about making it non-allocating.
I took a stab at it and cargo bench shows a big improvement. Let me know if that is something that you would be interested in , than I can keep working on it a bit more.
Below is the output of the included bench functions for trimming. I think they are showing the best case though since the test file seems to contain no trimmable white-space.
Also what is confusing to me is that the StringRecord benchmark also benefits from improvements to the ByteRecord trim method, not sure where this is coming from.